6956-6971 Nucleic Acids Research, 2014, Vol 42, No. 11 
doi: 10.1093/nar/gku372



Published online 29 May 2014 



TET1 is a maintenance DNA demethylase that 
prevents methylation spreading in differentiated cells 

Chunlei Jin, Yue Lu, Jaroslav Jelinek, Shoudan Liang, Marcos R.H. Estecio,
Michelle Craig Barton and Jean-Pierre J. Issa

The Graduate School of Biomedical Sciences, The University of Texas Health Science Center at Houston, Houston,
TX 77030, USA, Department of Leukemia, The University of Texas M.D. Anderson Cancer Center, Houston, TX
77030, USA, Department of Biochemistry and Molecular Biology, The University of Texas M.D. Anderson Cancer
Center, Houston, TX 77030, USA, Fels Institute for Cancer Research and Molecular Biology, Temple University,
Philadelphia, PA 19140, USA, Department of Molecular Carcinogenesis, The University of Texas M.D. Anderson
Cancer Center, Houston, TX 77030, USA and Department of Bioinformatics and Computational Biology, The
University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA

Received March 18, 2014; Accepted April 16, 2014 



ABSTRACT 

TET1 is a 5-methylcytosine dioxygenase and its 
DNA demethylating activity has been implicated in 
pluripotency and reprogramming. However, the pre- 
cise role of TET1 in DNA methylation regulation out- 
side of developmental reprogramming is still unclear. 
Here, we show that overexpression of the TET1 cat- 
alytic domain but not full length TET1 (TET1-FL) in- 
duces massive global DNA demethylation in differ- 
entiated cells. Genome-wide mapping reveals that 
5-hydroxymethylcytosine production by TET1-FL is 
inhibited as DNA methylation increases, which can 
be explained by the preferential binding of TET1- 
FL to unmethylated CpG islands (CGIs) through its 
CXXC domain. TET1-FL specifically accumulates 5- 
hydroxymethylcytosine at the edges of hypomethy- 
lated CGIs, while knockdown of endogenous TET1 in- 
duces methylation spreading from methylated edges 
into hypomethylated CGIs. We also found that gene 
expression changes after TET1-FL overexpression 
are relatively small and independent of its dioxyge- 
nase function. Thus, our results identify TET1 as a 
maintenance DNA demethylase that does not pur- 
posely decrease methylation levels, but specifically 
prevents aberrant methylation spreading into CGIs 
in differentiated cells. 



INTRODUCTION 

DNA methylation at the C5 position of cytosine (5-methylcytosine, 5mC) is a crucial epigenetic
modification that has been implicated in numerous cellular processes in mammals, including
embryonic development, transcription, X chromosome inactivation, genomic imprinting and
chromatin structure (1,2). The methylation pattern of the genome is dynamic during normal
development, starting from fertilization through embryogenesis and postnatal growth, and
abnormal methylation changes are involved in various human diseases, such as cancer (3,4).
The patterns of DNA methylation in cells are initially established by the de novo DNA
methyltransferases DNMT3a and DNMT3b, and then faithfully maintained during DNA replication
by the maintenance methyltransferase DNMT1 (5-8). In contrast to the well-defined DNA
methyltransferases, the potential enzymes that erase DNA methylation are only beginning to
be understood (9,10).

The ten-eleven translocation (TET) family proteins were recently identified as 5mC
dioxygenases which can consecutively convert 5mC into 5-hydroxymethylcytosine (5hmC),
5-formylcytosine and 5-carboxylcytosine and further induce passive or active DNA
demethylation in genomic DNA (11-20). Tet1, the founding member of the TET family, has been
intensely studied since its dioxygenase catalytic function was demonstrated (19,20).
Depletion of Tet1 in mouse embryonic stem cells (mESCs) causes decreased 5hmC levels and
increased DNA methylation at its target regions, and also affects gene transcription and
cell lineage specification (19,21-25). Tet1 was also reported to induce locus-specific
demethylation in mouse primordial germ cells and activate some meiotic genes (26-28).
Moreover, Tet1-mediated demethylation was observed in the reprogramming processes for
generation of induced pluripotent stem cells (29,30). However, outside of embryonic
development and reprogramming, little is known about the role of TET1 in DNA methylation
regulation in differentiated cells where it is commonly expressed (19,31). Previous studies,
which reported that overexpression of TET1 in HEK293 cells can induce DNA demethylation in
exogenous non-replicable DNA reporters and endogenous genomic loci, used overexpression of
the TET1 catalytic domain (TET1-CD) but not full length TET1 (TET1-FL) (17,32). Given the
possibility that the residual domains in TET1 may regulate the accessibility to its
catalytic domain, those results on TET1-CD do not precisely reveal the function of TET1 in
physiologic DNA methylation regulation.

In this report, we systematically investigated the effect of TET1 on DNA methylation in
HEK293T cells by overexpression of TET1-FL and knockdown of endogenous TET1. Our results
demonstrate that TET1 works as a maintenance DNA demethylase, which does not change DNA
methylation globally, but specifically maintains the DNA hypomethylation state of CpG
islands (CGIs) by preventing methylation spreading from methylated edges.

MATERIALS AND METHODS 

DNA construct and overexpression of TET1

To clone the open reading frame (ORF) of human TET1-FL (amino acids 1-2136), we extracted
total RNA from SY5Y cells using TRIzol® Reagent (Invitrogen). Reverse transcription was
performed with a gene-specific primer (5′-TATATACTGCAAGTTGCTAATACTTGAATG-3′) and the
AccuScript PfuUltra II RT-PCR Kit (Stratagene) according to the manufacturer's instructions.
After PCR amplification with AccuPrime™ Taq DNA Polymerase High Fidelity (Invitrogen), the
TET1-FL ORF was cloned into the pCR®-XL-TOPO® vector (Invitrogen). Finally, the TET1-FL ORF
fragment was transferred into the pIRES-hrGFP II vector (Stratagene), which contains a green
fluorescent protein (GFP) reporter and a 3x FLAG tag. The ORF of TET1-CD (amino acids
1418-2136) was amplified from the above TET1-FL ORF clone and was also inserted into the
pIRES-hrGFP II vector. Catalytically mutant mTET1-FL and mTET1-CD (H1671D, Y1673A) (20) and
CXXC domain-mutated TET1-FL-C594A (22) were generated by site-directed mutagenesis. The
sequences of all plasmids were validated by Sanger DNA sequencing. Transfection of these
plasmids into HEK293T cells was carried out with FuGENE HD transfection reagent (Roche)
according to the manufacturer's instructions. GFP-positive cells were collected by
fluorescence-activated cell sorting (FACS) at the indicated time points using a Becton
Dickinson FACSCalibur flow cytometer. HEK293T cells were obtained from ATCC, cultured in
Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum, and tested
negative for mycoplasma contamination.

Western blot and DNA dot blot assays 

For western blot assays, protein extraction was performed using radioimmunoprecipitation
assay (RIPA) buffer (Fisher) supplemented with 1x protease inhibitor cocktail solution
(Roche). The primary antibodies used included anti-FLAG (Cat. #200471, Stratagene),
anti-TET1 (GTX124207, GeneTex), anti-Lamin B (AB16048, Abcam) and anti-β-actin (GTX109639,
GeneTex). For DNA dot blot assays, different amounts of genomic DNA samples diluted in
0.4 mM NaOH/10 mM ethylenediaminetetraacetic acid (EDTA) were denatured at 100°C for 10 min,
followed by rapid chilling on ice. Two microliters of each denatured DNA sample was then
spotted onto a positively charged nylon membrane (Roche), and the diameter of each dot was
kept to <4 mm. After the membrane had dried, it was rinsed in 2x SSC buffer (0.3 M NaCl,
30 mM sodium citrate) and then air-dried completely. The dry membrane was wrapped in
UV-transparent plastic wrap and placed DNA-side-down on a UV transilluminator for 3 min to
immobilize the DNA. After blocking with 5% non-fat dry milk in PBS, the membrane was
immunoblotted using 5hmC antibody (Cat. #39769, Active Motif) and HRP-conjugated anti-rabbit
secondary antibody (NA934, GE Healthcare), and finally developed with enhanced
chemiluminescence reagents and exposed to X-ray imaging film.

Bisulfite-pyrosequencing and bisulfite-cloning-sequencing 

Bisulfite conversion of genomic DNA was done with EpiTect bisulfite kits (Qiagen) according
to the manufacturer's instructions. For bisulfite-pyrosequencing, a two-step polymerase
chain reaction (PCR) amplification was generally used as previously described (33,34). The
results were analyzed with Pyro Q-CpG Software (Qiagen). For bisulfite-cloning-sequencing, a
two-step PCR similar to that in bisulfite-pyrosequencing was used, but no biotinylated
primers were included in the second-step PCR. The final PCR product was then cloned into the
pCR4-TOPO vector (Invitrogen) and transformed into TOP10 chemically competent cells
(Invitrogen). After ~14 h of incubation at 37°C, individual clones were picked and amplified
by PCR using the pCR4 forward primer (5′-TCTGGAATTGTGAGCGGATA-3′) and reverse primer
(5′-GTTTTCCCAGTCACGACGTT-3′). Those PCR products were then sequenced with the M13-RV primer.
The primers used are listed in Supplementary Table S1.

HpaII-PCR DNA methylation assay

The HpaII-PCR DNA methylation assay was performed as previously described (17). In brief,
500 ng of genomic DNA was incubated with 10 units of HpaII (NEB) or in a mock reaction
without HpaII at 37°C for 8 h or overnight, followed by inactivation at 80°C for 20 min. The
DNA from HpaII digestion or mock treatment was tested by qPCR (Applied Biosystems 7500)
using Power SYBR® Green PCR Master Mix (Applied Biosystems) and primers flanking specific
digestion sites (CCGG). The PCR program comprised a 10 min activation step at 95°C, followed
by 40 cycles of 95°C for 15 s and 60°C for 1 min. DNA methylation of a CCGG site was
calculated as $2^{\mathrm{Ct(mock)} - \mathrm{Ct(HpaII)}} \times 100\%$.
The primers used are listed in Supplementary Table S1.
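
The Ct-based calculation above can be illustrated with a minimal Python sketch. The function name and the example Ct values are hypothetical, and the sketch assumes ideal amplification efficiency (a doubling of product per cycle), which the text does not state explicitly.

```python
def hpaii_methylation_percent(ct_mock: float, ct_hpaii: float) -> float:
    """Percent of template protected from HpaII digestion at a CCGG site.

    Assumes perfect doubling per PCR cycle (an assumption, not stated in the text).
    """
    # A fully methylated site is not cut, so Ct(HpaII) ~ Ct(mock) and the result is ~100%.
    # Each additional cycle needed by the digested sample halves the estimate.
    return 2.0 ** (ct_mock - ct_hpaii) * 100.0


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    print(hpaii_methylation_percent(24.0, 24.0))  # no Ct shift after digestion
    print(hpaii_methylation_percent(24.0, 26.0))  # two-cycle shift after digestion
```

A two-cycle delay in the digested sample corresponds to one quarter of the template remaining intact, that is, roughly 25% methylation under this idealized model.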

Chromatin immunoprecipitation and quantitative PCR 
(ChIP-qPCR)

Cells were fixed with fresh 1% formaldehyde at room temperature for 10 min and quenched with
125 mM glycine. The cells were then washed with cold phosphate buffered saline (PBS) and
resuspended in sodium dodecyl sulphate (SDS) lysis buffer (50 mM Tris pH 8.1, 10 mM EDTA,
1% SDS), followed by sonication with a Bioruptor (Diagenode) to get an average fragment size
of 200-500 bp. The resultant chromatin samples were diluted 10-fold in ChIP dilution buffer
(16.7 mM Tris pH 8.1, 1.2 mM EDTA, 167 mM NaCl, 1.1% Triton X-100, 0.01% SDS) and incubated
with Dynal® Protein G magnetic beads (Invitrogen) at 4°C overnight for pre-clearing. At the
same time, the anti-FLAG or anti-TET1 antibodies described above or control IgG were
pre-crosslinked with Dynal® Protein G magnetic beads at 4°C overnight. The pre-cleared
chromatin samples were then incubated with the antibody-bead complex at 4°C overnight. The
immunoprecipitated chromatin-antibody-bead complexes were extensively washed with RIPA
washing buffer (0.5 M EDTA, 5 M LiCl, 1 M HEPES-KOH pH 7.6, 10% NP-40, 10% Na-deoxycholate)
and TE buffer (pH 8.0) containing 50 nM NaCl, resuspended in elution buffer (50 mM Tris
pH 8.1, 10 mM EDTA, 1% SDS), and heated at 65°C for 15 min to separate chromatin from beads.
For reversing crosslinks, the isolated chromatin was incubated at 65°C overnight, followed
by digestion with RNase A and Proteinase K. Finally, the resultant DNA was purified with the
QIAquick PCR Purification Kit (Qiagen). Enrichment of target regions was determined by qPCR
under conditions similar to those described for the HpaII-PCR DNA methylation assay. The
primers used are listed in Supplementary Table S1.

Lentiviral shRNA-mediated TET1 knockdown

Two different TET1 shRNAs in the pTRIPZ vectors (shTET1#1: V2THS_141063 and shTET1#2:
V2THS_203196, Open Biosystems) were transferred into the MluI and XhoI sites of the pGIPZ
vectors (Open Biosystems). A non-targeting shRNAmir-pGIPZ vector was used as a negative
control (RHS4743, Open Biosystems). To produce lentiviral particles, pGIPZ-shTET1 and the
packaging plasmids psPAX2 and pMD2.G (Addgene) were transfected into HEK293FT cells
(Invitrogen) at a ratio of 1:1:1 using Lipofectamine® 2000 Transfection Reagent
(Invitrogen). Two days after transfection, the viral supernatant was collected and filtered
with 0.45 μm filters (Millipore). HEK293T cells were then infected with each lentivirus
supernatant in the presence of 8 μg/ml of polybrene (Sigma). Puromycin selection
(1.5 μg/ml, Sigma) began 2 days after infection. Stable knockdown cells were cloned by
limiting dilution and selected by western blot assay based on TET1 protein level. These
knockdown cells were subjected to further analyses 3-4 months after transfection.

Digital restriction enzyme analysis of methylation (DREAM) 

DREAM was performed as previously described (35). Five micrograms of genomic DNA from
(m)TET1-FL- or (m)TET1-CD-overexpressing HEK293T cells were first spiked with 0.05 ng of a
set of specific calibrators with different methylation levels. The DNA mixture was then
sequentially digested with 5 μl SmaI (3 h at 37°C, Fermentas) and 50 U XmaI (~16 h at 37°C,
NEB), resulting in distinct DNA signatures at unmethylated or methylated SmaI sites
(CCCGGG). After purification with a QIAquick PCR purification kit, the digested DNA was
heated at 65°C for 3 min followed by snap cooling to free concatenated CCGG overhangs.
Klenow fragment (3′→5′ exo-) (NEB) and CGA mix (dCTP, dGTP, dATP, 10 mM each) were then
added to fill in the overhangs and add an A tail to the 3′ end. The resultant DNA was
purified again and then ligated with Illumina paired-end adapters (PEA1:
5′-phosphate-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG-3′, PEA2:
5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′) using Quick T4 DNA ligase (Enzymatics).
Subsequently, the adapter-ligated DNA was separated on a 2% agarose gel. A gel slice
corresponding to 250-375 bp was excised and purified with the QIAquick Gel Extraction Kit
(Qiagen). PCR amplification (18 cycles) of the gel-extracted DNA was performed using
Illumina paired-end PCR primers and iProof HF master mix (Bio-Rad). The resulting sequencing
libraries were purified with the Agencourt AMPure PCR Purification Kit (Beckman Coulter).
The libraries were then sequenced by paired-end 36 nt sequencing on an Illumina Genome
Analyzer II. The sequencing reads were mapped to SmaI sites in the human genome hg18, and
signatures corresponding to methylated and unmethylated CpG were enumerated for each SmaI
site. The minimum coverage was set at >20 reads unless otherwise indicated. The methylation
value was calculated as the ratio of the number of methylated tags over the total number of
tags mapped to a given SmaI/XmaI site. Data have been deposited in GEO with accession
number: GSE44038.
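
As a rough illustration of the per-site methylation value described above, the following Python sketch computes the ratio of methylated tags to total tags and applies a coverage filter. The function name is hypothetical, the threshold is treated as inclusive (an assumption), and the spiked-in calibrator correction used in DREAM is deliberately left out.

```python
from typing import Optional

MIN_COVERAGE = 20  # coverage threshold from the text; treated as inclusive here (assumption)


def dream_methylation(methylated_tags: int, unmethylated_tags: int,
                      min_coverage: int = MIN_COVERAGE) -> Optional[float]:
    """Methylation value of one SmaI/XmaI site, or None if coverage is too low."""
    total = methylated_tags + unmethylated_tags
    if total < min_coverage:
        return None  # site is not quantified
    # Ratio of methylated tags over all tags mapped to the site.
    return methylated_tags / total


if __name__ == "__main__":
    print(dream_methylation(15, 15))  # well-covered site
    print(dream_methylation(5, 5))    # under-covered site
```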

As for the genomic location of SmaI sites, each SmaI site was assigned to the gene that has
the closest transcription start site (TSS). The region was then classified by its location
relative to the gene: upstream (−5 to −1 kb from TSS), promoter (−1 to 0.5 kb from TSS),
exon, intron, downstream (−0.5 to 1 kb from transcription end site (TES)) and intergenic
(1 kb from TES to −5 kb of downstream gene). The gene list used to annotate the enriched
regions is the RefSeq gene list downloaded from the UCSC genome browser
(http://genome.ucsc.edu/) in April 2010. The CGI annotation was also obtained from the UCSC
website.
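
The region classification above can be sketched in Python as follows. This is only an illustration: the function name and input layout are hypothetical, and the order in which overlapping categories are resolved (promoter first, then upstream, downstream, exon/intron) is an assumption, since the text does not specify it.

```python
from typing import List, Tuple


def classify_site(pos: int, tss: int, tes: int, strand: str,
                  exons: List[Tuple[int, int]]) -> str:
    """Classify a site relative to its assigned gene (coordinates in bp)."""
    sign = 1 if strand == "+" else -1
    rel_tss = (pos - tss) * sign  # negative values lie upstream of the TSS
    rel_tes = (pos - tes) * sign  # positive values lie past the TES
    if -1000 <= rel_tss <= 500:
        return "promoter"
    if -5000 <= rel_tss < -1000:
        return "upstream"
    if -500 <= rel_tes <= 1000:
        return "downstream"
    if rel_tss > 500 and rel_tes < -500:
        # Inside the gene: exon if any annotated exon (inclusive bounds) contains the site.
        in_exon = any(start <= pos <= end for start, end in exons)
        return "exon" if in_exon else "intron"
    return "intergenic"


if __name__ == "__main__":
    # Hypothetical plus-strand gene for illustration.
    exons = [(10000, 10200), (15000, 15200), (19800, 20000)]
    for p in (6000, 9500, 12000, 15100, 20500, 30000):
        print(p, classify_site(p, 10000, 20000, "+", exons))
```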

To analyze the distribution of DNA methylation on gene bodies, each SmaI site was classified
into bins according to its location relative to the closest gene. Five kilobases upstream of
the TSS and 5 kb downstream of the TES were subdivided into 20 bins, with 500 bp for each
bin. The gene body was subdivided evenly into 20 bins. The average percentage of methylation
for all the SmaI sites in each bin was calculated.
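
A minimal Python sketch of this binning scheme is given below, assuming 10 flanking bins of 500 bp on each side (20 in total) and 20 equal-width gene-body bins, numbered 0-39 in the direction of transcription. The function name and the bin numbering are illustrative choices, not taken from the original analysis.

```python
from typing import Optional

FLANK = 5000      # bp covered on each side of the gene
FLANK_BIN = 500   # bp per flanking bin: 10 bins per side, 20 in total
BODY_BINS = 20    # equal-width bins across the gene body


def meta_gene_bin(pos: int, tss: int, tes: int, strand: str) -> Optional[int]:
    """Return a bin index in 0..39, or None if the site lies outside the window."""
    sign = 1 if strand == "+" else -1
    rel = (pos - tss) * sign      # distance from the TSS along the transcript
    length = (tes - tss) * sign   # gene length in bp (must be positive)
    n_flank = FLANK // FLANK_BIN
    if -FLANK <= rel < 0:
        return (rel + FLANK) // FLANK_BIN                          # bins 0..9
    if 0 <= rel < length:
        return n_flank + rel * BODY_BINS // length                 # bins 10..29
    if length <= rel < length + FLANK:
        return n_flank + BODY_BINS + (rel - length) // FLANK_BIN   # bins 30..39
    return None


if __name__ == "__main__":
    # Hypothetical 10 kb plus-strand gene.
    for p in (5000, 9999, 10000, 19999, 20000, 24999, 25000):
        print(p, meta_gene_bin(p, 10000, 20000, "+"))
```

Averaging the methylation values of all sites that fall into the same bin then gives the profile described in the text.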

Hydroxymethylated DNA immunoprecipitation combined 
with next generation DNA sequencing (hMeDIP-seq) 

Three micrograms of genomic DNA from (m)TET1-FL- or (m)TET1-CD-overexpressing HEK293T cells
were diluted in TE buffer (pH 7.6) and sonicated with a Bioruptor. The desired fragment size
was 100-500 bp. The resultant DNA was purified with a QIAquick PCR purification kit. Five
hundred nanograms of purified sonicated DNA was spiked with 20 pg of a set of specific
calibrators containing different 5hmC levels. End repair, addition of 'A' bases to the 3′
end of the DNA fragments and ligation with Illumina paired-end adapters were then performed
similarly to the procedure described for DREAM. The resultant DNA was purified again,
diluted in 450 μl TE buffer and denatured at 95°C for 10 min, followed by snap chilling on
ice. The denatured DNA was subsequently mixed with 50 μl 10x IP buffer (1.4 M NaCl, 100 mM
Na-phosphate, pH 7.0, 0.5% Triton X-100), followed by incubation with 1 μl of the anti-5hmC
antibody described above at 4°C overnight. Dynal® Protein G magnetic beads were then added
and incubated at 4°C for 2 h. After extensive washing with 1x IP buffer (140 mM NaCl, 10 mM
Na-phosphate, pH 7.0, 0.05% Triton X-100), DNA was separated from the beads at 65°C for
15 min in elution buffer (10 mM Tris pH 7.6, 1 mM EDTA, 1% SDS). The eluted products were
size selected by electrophoresis in a 2% agarose gel. A slice corresponding to a 300 ± 25 bp
size window based on the DNA ladder was cut out. The DNA extracted from agarose was
amplified by PCR (10-15 cycles) using Illumina paired-end PCR primers and Phusion™
High-Fidelity DNA Polymerase (NEB). The resultant sequencing libraries were purified with
the Agencourt AMPure PCR Purification Kit and sent for single-read sequencing on an Illumina
Genome Analyzer II. Sequenced DNA tags were mapped to human genome hg18 and uniquely mapped
tags were kept. For multiple tags that were mapped to the same genomic location, only one
was considered in the analysis to avoid PCR bias. CCAT (version 3.0) (36) was used to detect
5hmC peaks in hMeDIP-seq samples. The window size was set as 500 bp. Peaks with FDR <0.05
and >5-fold enrichment relative to input were deemed significant. Each peak was assigned to
the gene that has the closest TSS, as described above for SmaI sites. As for the landscape
of the data, each tag was extended by 250 bp to its 3′ end. The number of overlapped tags in
each genome position was then rescaled to normalize the number of background tags to 10 M
and averaged over 10 bp resolution. The averaged values were displayed using the UCSC genome
browser (http://genome.ucsc.edu/). Data have been deposited in GEO with accession number:
GSE44036.
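
The tag-level steps described above (collapsing duplicate tags, extending tags toward the 3′ end and rescaling to 10 M background tags) can be sketched in Python. The data layout and function names are hypothetical, strand is treated as part of a tag's location, and the 250 bp extension is interpreted as the total span from the tag's 5′ position; both are assumptions. The final 10 bp averaging step is omitted for brevity.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Tag = Tuple[str, int, str]  # (chromosome, 5' position, strand)


def dedupe_tags(tags: Iterable[Tag]) -> List[Tag]:
    """Keep a single tag per (chromosome, position, strand) to limit PCR duplicates."""
    return sorted(set(tags))


def extend_tag(tag: Tag, length: int = 250) -> Tuple[str, int, int]:
    """Half-open interval covered by a tag after extension toward its 3' end."""
    chrom, pos, strand = tag
    if strand == "+":
        return chrom, pos, pos + length
    return chrom, max(0, pos - length + 1), pos + 1


def scaled_coverage(tags: Iterable[Tag], background_tags: float,
                    target: float = 10e6) -> Dict[Tuple[str, int], float]:
    """Per-base coverage, rescaled so that the background tag count equals `target`."""
    scale = target / background_tags
    cov: Counter = Counter()
    for chrom, start, end in map(extend_tag, dedupe_tags(tags)):
        for p in range(start, end):
            cov[(chrom, p)] += scale
    return dict(cov)


if __name__ == "__main__":
    # Toy tags; the first two are duplicates and are counted once.
    tags = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 300, "+")]
    cov = scaled_coverage(tags, background_tags=5e6)
    print(cov[("chr1", 100)], cov[("chr1", 320)])
```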

The distribution of hMeDIP signal on different genomic 
features was analyzed as follows. Gene body: for each gene, 
5 kb upstream of TSS and 5 kb downstream of TES were 
subdivided into 1 kb bins, and the gene body was subdi- 
vided evenly into 20 bins. TSS: 10 kb upstream to 10 kb 
downstream of the TSS of each gene was subdivided into 
250 bp bins. Exon: 200% upstream to 200% downstream of 
each exon was subdivided into 25 bins, with each bin 20% 
of the exon length. Exon-intron boundary: at each exon- 
intron boundary, 100 bp into exon and 100 bp into intron 
were subdivided into 25 bp bins. CGI: 10 kb upstream to 
10 kb downstream of each CGI was subdivided into 50 
bins, consisting of 20 bins for upstream or downstream (500 
bp each) and 10 bins for CGI (10% of the length of CGIs 
each). In each bin, the number of tags was normalized by 
the length of the bin and the total number of background 
tags in the genome (the number of background tags was cal- 
culated as total number of tags multiplied by the noise rate 
from CCAT). The normalized tag density of the hMeDIP 
sample in each bin was then subtracted by that of the corre- 
sponding input and averaged over all the respective features 
in the genome. 
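
The normalization described in this paragraph can be written out as a short Python sketch. The function names are hypothetical, and the noise rate is assumed to be supplied by the peak caller; no specific CCAT output format is implied.

```python
from typing import List, Sequence


def normalized_density(tags_in_bin: int, bin_length_bp: int,
                       total_tags: int, noise_rate: float) -> float:
    """Tag count normalized by bin length and by the number of background tags."""
    background_tags = total_tags * noise_rate  # total tags multiplied by the noise rate
    return tags_in_bin / (bin_length_bp * background_tags)


def average_profile(ip_profiles: Sequence[Sequence[float]],
                    input_profiles: Sequence[Sequence[float]]) -> List[float]:
    """Subtract input from hMeDIP per bin, then average each bin over all features."""
    n_features = len(ip_profiles)
    n_bins = len(ip_profiles[0])
    profile = [0.0] * n_bins
    for ip_row, in_row in zip(ip_profiles, input_profiles):
        for i in range(n_bins):
            profile[i] += (ip_row[i] - in_row[i]) / n_features
    return profile


if __name__ == "__main__":
    # Two toy features with two bins each, already expressed as normalized densities.
    print(average_profile([[4.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]))
```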



RNA isolation and reverse transcription-qPCR analysis 

Total RNA was isolated using TRIzol Reagent as per the manufacturer's specifications. cDNA
was synthesized from 1 μg of DNase-treated total RNA using the High-Capacity cDNA reverse
transcription kit (Applied Biosystems). qPCR was performed under conditions similar to those
described for the HpaII-PCR DNA methylation assay. The average threshold cycle (Ct) was
determined for each gene and normalized to β-actin as an internal normalization control. The
primers used are listed in Supplementary Table S1.

Whole-genome gene expression microarray analysis 

Affymetrix GeneChip® Whole-Transcript Human Gene 
2.0 ST Arrays were used for global gene expression analy- 
sis. RNA samples were labeled and hybridized according to 
the manufacturer's instructions (Affymetrix, Santa Clara, 
CA, USA). Scanned microarray images were analyzed us- 
ing the Affymetrix Gene Expression Console with RMA 
(Robust Multi-array Average) normalization algorithm. A 
criterion of 1.5-fold expression change was used to identify 
differentially regulated genes. Hierarchical clustering was 
performed on significant genes using signal intensities after 
RMA normalization by hclust function in R. Signal inten- 
sity values were rescaled to z-scores by row before cluster- 
ing. Heatmap was generated by heatmap.2 function in R. 
Data have been deposited in GEO with accession number: 
GSE50016. 
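
For illustration, the fold-change criterion and the row-wise z-score rescaling mentioned above might look as follows in Python (the original analysis used R). The function names are hypothetical; the sketch assumes that RMA signal intensities are on a log2 scale and uses the sample standard deviation, as R's `scale` does.

```python
import math
from statistics import mean, stdev
from typing import List


def is_changed(log2_a: float, log2_b: float, fold: float = 1.5) -> bool:
    """True if two log2 intensities differ by at least `fold` in either direction."""
    return abs(log2_a - log2_b) >= math.log2(fold)


def row_zscores(row: List[float]) -> List[float]:
    """Rescale one gene's values across samples to mean 0 and unit sample SD."""
    m, s = mean(row), stdev(row)
    if s == 0:
        return [0.0 for _ in row]  # constant row: no variation to scale
    return [(x - m) / s for x in row]


if __name__ == "__main__":
    print(is_changed(8.0, 9.0), is_changed(8.0, 8.3))
    print(row_zscores([1.0, 2.0, 3.0]))
```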

RNA-seq 

RNA-seq sequencing libraries were made from 100 ng of DNase-treated total RNA samples using
the Encore® Complete RNA-Seq Library System (NuGEN, San Carlos, CA, USA) following the
manufacturer's protocol. The libraries were sequenced using a 2 x 100 base paired-end
protocol on the Illumina HiSeq 2000 instrument at the Fox Chase Cancer Center (Philadelphia,
PA, USA). Each library was sequenced in a single lane, generating 188-241 million reads per
sample. The reads were mapped to the human genome (hg19) by TopHat (V2.0.5) (37). The number
of fragments in each known gene from the RefSeq database (downloaded from the UCSC Genome
Browser on 9 March 2012) was enumerated using htseq-count from the HTSeq package (V0.5.3p9).
Differential expression between samples was statistically assessed with the R/Bioconductor
packages edgeR (V3.0.8) (38) and DESeq (V1.10.1) (39), using the two most similar samples
(mTET1-FL and TET1-FL) to estimate the biological variation. Genes with FDR <0.05 by both
edgeR and DESeq and fold change >2 were called significant.
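
The consensus rule for calling a gene significant can be expressed as a small Python predicate. This is a sketch only: the function name is hypothetical, significance is taken to mean an FDR below 0.05 in both tools, and the fold-change cutoff is applied symmetrically to up- and down-regulation (an assumption).

```python
def is_significant(fdr_edger: float, fdr_deseq: float, fold_change: float,
                   fdr_cutoff: float = 0.05, fc_cutoff: float = 2.0) -> bool:
    """Require agreement of both tools plus a minimum fold change.

    `fold_change` is a positive ratio (treated/control); values below 1
    represent down-regulation and are inverted before comparison.
    """
    magnitude = max(fold_change, 1.0 / fold_change)
    return (fdr_edger < fdr_cutoff
            and fdr_deseq < fdr_cutoff
            and magnitude > fc_cutoff)


if __name__ == "__main__":
    print(is_significant(0.01, 0.02, 0.25))  # strong down-regulation, both tools agree
    print(is_significant(0.01, 0.20, 4.0))   # only one tool passes the FDR cutoff
```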

For the fragments that have both ends mapped, the first reads were kept. Together with the
reads from the fragments that have only one end mapped, every read was extended to its 3′
end by 200 bp in exon regions. For each read, a weight of 1/n was assigned, where n is the
number of positions the read was mapped to. The sum of weights for all the reads that cover
each genomic position was rescaled to normalize the total number of fragments to 1 M and
averaged over 10 bp resolution. The averaged values were displayed using the UCSC genome
browser (http://genome.ucsc.edu/). Data have been deposited in GEO with accession
number: GSE49833.
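
The 1/n weighting of multi-mapped reads can be sketched in Python as below. The input layout (read identifier plus an already extended, half-open interval) and the function name are hypothetical, and exon-aware extension and 10 bp averaging are omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Alignment = Tuple[str, str, int, int]  # (read id, chromosome, start, end), half-open


def weighted_coverage(alignments: Iterable[Alignment], total_fragments: float,
                      target: float = 1_000_000) -> Dict[Tuple[str, int], float]:
    """Per-base sum of read weights, rescaled to `target` total fragments."""
    alignments = list(alignments)
    # n = number of positions each read was mapped to
    hits = Counter(read_id for read_id, _, _, _ in alignments)
    scale = target / total_fragments
    cov: Dict[Tuple[str, int], float] = defaultdict(float)
    for read_id, chrom, start, end in alignments:
        weight = scale / hits[read_id]  # 1/n, then rescaled
        for p in range(start, end):
            cov[(chrom, p)] += weight
    return dict(cov)


if __name__ == "__main__":
    # Read r2 maps to two places, so each placement contributes a weight of 0.5.
    alns = [("r1", "chr1", 0, 3), ("r2", "chr1", 2, 4), ("r2", "chr2", 0, 2)]
    print(weighted_coverage(alns, total_fragments=2, target=2))
```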

Hierarchical clustering was performed on significant genes from any of the four comparisons
(mTET1-CD, TET1-CD, mTET1-FL, TET1-FL versus control) using their FPKM values (fragments per
kilobase of exon per million fragments mapped) by the hclust function in R. FPKM values were
rescaled to z-scores by row before clustering. The heatmap was generated by the heatmap.2
function in R.

RESULTS 

TET1-FL and TET1-CD display differential ability for
5hmC production and DNA demethylation

Overexpression of TET1-CD induces significant DNA demethylation in selected genomic loci in
HEK293 cells (17,32) but it is not known if this is true for TET1-FL. To test this, HEK293T
cells were transiently transfected with TET1-FL or TET1-CD expression plasmids, and
expressing cells were harvested by GFP sorting (Figure 1A and B and Supplementary Figure
S1A-C). DNA dot-blot assays confirmed a dramatic production of 5hmC in cells transfected
with wild type TET1-FL or TET1-CD but not those transfected with catalytically mutant
TET1-FL or TET1-CD (referred to as mTET1-FL and mTET1-CD, respectively) 3 days after
transfection (Supplementary Figure S1D). However, compared with TET1-CD, TET1-FL showed a
much lower 5hmC production (~1/8 of that by TET1-CD) (Supplementary Figure S1D), which may
be attributable to its lower expression level (Figure 1B) and/or its possible lower inherent
efficiency for 5hmC production in the genome. We thus increased the expression time of
TET1-FL to compensate for its low expression level, and found a >2-fold increase in 5hmC
content from 3 to 5 days after transfection but only a slight increase from 5 to 7 days
after transfection (Figure 1C). Moreover, given that some 5hmC would be further converted to
other forms of cytosine (including unmodified cytosine) during the prolonged expression
time, the actual 5hmC production by TET1-FL 7 days after transfection should be higher than
what we observed in the DNA dot-blot assay (Figure 1C). Thus, to compare the effects of
TET1-CD and TET1-FL on 5hmC production and potential DNA demethylation in subsequent
experiments, we always used TET1-CD- or TET1-FL-overexpressing cells collected 3 or 7 days
after transfection, respectively, unless otherwise indicated.

We next examined whether TET1-mediated 5mC oxidation leads to DNA demethylation of
endogenous genomic DNA. We initially used bisulfite-pyrosequencing for quantitative analysis
of DNA methylation. Although the oxidized products of TET1 can interfere with bisulfite
analysis in that 5hmC reacts the same as 5mC while 5fC and 5caC react as unmodified cytosine
(15,40,41), the abundance of these modified bases is typically one to several orders of
magnitude lower than that of 5mC (14), suggesting that their impact on the final results
would be relatively low. We found that overexpression of TET1-CD induced significant
demethylation at randomly selected methylated genomic loci, including long interspersed
nucleotide element-1 (LINE-1) and promoters of the RASSF1a, OCT4 and PGR genes (Figure 1D).
In marked contrast, TET1-FL had no measurable effect on DNA methylation in this assay. Since
HpaII digestion is blocked by 5mC and its oxidative derivatives (15,17), we also used
HpaII-PCR based DNA methylation assays to confirm these results. Here again, only TET1-CD
overexpression induced a significant decrease of DNA methylation at the promoters of
INVSIABP, NPAS3 and PARES 1 (Figure 1E). Taken together, the above results at least suggest
that TET1-FL and TET1-CD have differential ability for 5hmC production and DNA
demethylation.

Overexpression of TET1-CD but not TET1-FL induces
global DNA demethylation

To further characterize the effects of TET1-FL and TET1-CD on DNA methylation, we examined
global DNA methylation changes in transfected HEK293T cells by using DREAM. This method
provides quantitative analysis of DNA methylation with high accuracy (35). DNA methylation
levels of 34,322; 33,395; 43,936 and 42,988 CpG sites were quantified in cells transfected
with TET1-CD, mTET1-CD, TET1-FL and mTET1-FL, respectively. Pairwise comparison of 32,803
common CpGs between TET1-CD and mTET1-CD transfections revealed a massive DNA demethylation
induced by overexpression of TET1-CD (Figure 2A and B), with 2,957 CpGs (~8.6%) demethylated
by >20% (Figure 2A). These DNA demethylation events were the same in CGI and non-CGI DNA,
upstream, promoter, exons, introns, downstream and intergenic regions, indicating that
TET1-CD induces a genome-wide DNA demethylation without distribution bias (Figure 2C and D
and Supplementary Figure S2A). Very different results were seen for TET1-FL. Pairwise
comparison of 40,937 common CpGs between TET1-FL and mTET1-FL transfections showed no
significant DNA demethylation induced by TET1-FL overexpression (Figure 2E and F). Given
that Tet1 specifically binds CpG-rich regions in mESCs (21,22,25), we then asked whether
TET1-FL selectively induces DNA demethylation in CGIs. However, neither CGIs nor non-CGIs
showed significant DNA demethylation induced by TET1-FL overexpression (Figure 2G and H).
Similarly, further analyses of upstream, promoter, exons, introns, downstream and intergenic
regions also did not find any significant demethylation after TET1-FL overexpression
(Supplementary Figure S2B). Therefore, unlike TET1-CD, overexpression of TET1-FL in HEK293T
cells cannot induce significant DNA demethylation genome-wide, indicating that TET1 is not
as efficient a DNA demethylase as previously predicted (17,32).
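
The pairwise comparison of common CpG sites can be illustrated with a short Python sketch. The function name and the dictionary layout (site identifier mapped to a methylation fraction) are hypothetical, and "demethylated by >20%" is interpreted here as an absolute drop of more than 20 percentage points relative to the catalytic mutant, which is an assumption.

```python
from typing import Dict, List, Tuple


def demethylated_sites(wild_type: Dict[str, float], mutant: Dict[str, float],
                       min_drop: float = 0.20) -> Tuple[List[str], int]:
    """Return sites whose methylation drops by more than `min_drop`, and the number
    of sites quantified in both samples."""
    common = wild_type.keys() & mutant.keys()  # only sites covered in both transfections
    hits = sorted(s for s in common if mutant[s] - wild_type[s] > min_drop)
    return hits, len(common)


if __name__ == "__main__":
    # Toy methylation fractions keyed by hypothetical site identifiers.
    wt = {"site_a": 0.5, "site_b": 0.9, "site_c": 0.1}
    mut = {"site_a": 0.8, "site_b": 0.95, "site_d": 0.3}
    hits, n_common = demethylated_sites(wt, mut)
    print(hits, n_common, f"{100 * len(hits) / n_common:.1f}%")
```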

TET1-FL and TET1-CD differentially regulate 5hmC distribution patterns

As the primary product of the TET-catalyzed 5mC oxidation reaction, 5hmC also serves as a
critical intermediate for TET-induced DNA demethylation (13,15,17,18). We therefore asked
whether the differences in demethylation induction between TET1-CD and TET1-FL were due to
their different regulation of 5hmC distribution. We performed genome-wide mapping of 5hmC in
HEK293T cells with 5hmC antibody-based hMeDIP-seq (Figure 3A and Supplementary Figure S3).
Once again, we used as controls mTET1-CD and mTET1-FL, which cannot catalyze 5mC oxidation.
In these two control transfections, a similar number of 5hmC peaks and an almost identical
distribution pattern were detected (Figure 3B) with 5hmC enrichment around promoters but
even distribution in exons and introns at a relatively low level (Figure 3C and
Supplementary Figure S4A and B). The enrichment of 5hmC around TSSs is similar to what was
previously observed in mESCs (25,42). Interestingly, 5hmC density showed a dip in
CGI-overlapped TSSs but peaked at non-CGI-overlapped TSSs (Figure 3D and E).

We next examined the 5hmC distributions in TET1-CD- and TET1-FL-overexpressing cells.
Consistent with the DNA dot blot assays (Figure 1C), TET1-FL markedly increased the number
of 5hmC peaks (from 75 482 to 111 648), with an even greater increase in
TET1-CD-overexpressing cells (from 61 095 to 314 557) (Figure 3B). Importantly, TET1-FL and
TET1-CD showed very different 5hmC distribution patterns across gene bodies and around TSSs.
While TET1-CD markedly increased 5hmC in gene bodies with higher enrichment in exons than in
introns, TET1-FL resembled the distribution of 5hmC in control cells with the highest levels
just upstream of TSSs (Figure 3C and Supplementary Figure S4A and B). The contrast between
TET1-CD and TET1-FL was more apparent around CGI-overlapped TSSs. In both cases, 5hmC was
depleted just at TSSs (Figure 3D). However, TET1-CD induced equal accumulation of 5hmC
outside of TSSs, while TET1-FL formed two dramatic 5hmC peaks flanking TSSs (Figure 3D). By
contrast, the 5hmC increase induced by TET1-FL was much lower around non-CGI-overlapped TSSs
(Figure 3E), suggesting that TET1-FL preferentially functions in CGIs.

We next directly examined the 5hmC distribution across CGIs. Interestingly, the
TET1-FL-induced 5hmC peaks described above (Figure 3D) precisely resided at the edges of
promoter CGIs, and the increased 5hmC by TET1-CD also spread evenly outside of those CGIs
(Figure 3F). However, other CGIs (in upstream, exons, introns, downstream and intergenic
regions) showed much less 5hmC accumulation by TET1-FL at their edges and significantly
increased 5hmC by TET1-CD over their bodies (Supplementary Figure S5A-E). In contrast to
promoter CGIs, which were predominantly unmethylated, most non-promoter CGIs were moderately
or highly methylated (Supplementary Figure S6A and B). We thus hypothesized that those
different 5hmC distribution patterns across promoter and non-promoter CGIs are associated
with their different basal methylation levels. As expected, hypomethylated (methylation
<10%) non-promoter CGIs exhibited 5hmC distribution patterns highly similar to those of
promoter CGIs (Figure 3F and G), while methylated (methylation >50%) non-promoter CGIs
showed extremely increased 5hmC by TET1-CD but much less 5hmC enrichment at CGI edges by
TET1-FL (Figure 3H). Thus, these results indicate that TET1-FL and TET1-CD differentially
regulate 5hmC distribution in HEK293T cells, with a marked preference of TET1-FL toward the
edges of hypomethylated CGIs, while TET1-CD appears to monotonously increase 5hmC levels.

DNA hypermethylation inhibits 5hmC production by TET1-FL

The requirement for pre-existing 5mC in TET-catalyzed 5hmC
production implies a positive correlation between 5hmC and
5mC distributions in genomic DNA. However, 5hmC and 5mC
actually have distinct genomic distributions in mESCs
(22,23,25,42). For example, contrary to 5mC, 5hmC is
significantly enriched around TSSs but generally not
detectable at repetitive elements and minor satellite
repeats in mESCs (25). To study this issue, we combined our
global DNA methylation and 5hmC distribution data and
analyzed the correlation between 5mC and 5hmC levels in
TET1-CD- and TET1-FL-overexpressing cells. The 5hmC tag
density in TET1-CD-overexpressing cells correlated
positively with basal DNA methylation (Figure 4A; Pearson
r = 0.89, P = 0.0005). By contrast, 5hmC tag density
remained at a relatively constant level regardless of basal
methylation in control cells and also in
TET1-FL-overexpressing cells, suggesting that high levels of
5mC actually inhibit TET1-FL catalytic function (Figure 4A).
We next profiled 5hmC tag density together with DNA
methylation levels across gene bodies. The basal DNA
methylation levels in control cells were lowest at TSSs and
gradually increased along gene bodies, followed by a
dramatic drop around TESs (Figure 4B and C). The 5hmC
distribution profile in TET1-CD-overexpressing cells closely
resembled the basal DNA methylation pattern, further
confirming the positive correlation between 5mC and 5hmC in
the setting of TET1-CD overexpression (Figure 4B). Indeed,
as the 5hmC density increased toward TESs, the extent of
TET1-CD-induced DNA demethylation also increased,
demonstrating that high 5hmC production is required for
significant DNA demethylation by TET1-CD. By contrast,
TET1-FL overexpression failed to produce more 5hmC as basal
DNA methylation increased along gene bodies and also failed
to induce DNA demethylation (Figure 4C). Therefore, this
divergent distribution of 5hmC by TET1-CD and TET1-FL likely
explains their differences in inducing DNA demethylation.
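
The kind of correlation reported for Figure 4A can be illustrated with a minimal Python sketch. It computes a Pearson coefficient between basal methylation and 5hmC tag density across bins. The function name `pearson_r` and all numeric values are hypothetical placeholders chosen for illustration; they are not data from this study, and the binning scheme is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical bins: mean basal methylation (%) and the 5hmC tag
# density observed in the same bins (arbitrary units).
basal_methylation = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
tag_density_cd = [1.0, 1.4, 1.9, 2.1, 2.8, 3.0, 3.6, 3.9, 4.5, 4.8]  # rises with 5mC
tag_density_fl = [1.2, 1.1, 1.3, 1.2, 1.1, 1.2, 1.3, 1.1, 1.2, 1.2]  # roughly flat

r_cd = pearson_r(basal_methylation, tag_density_cd)  # strongly positive
r_fl = pearson_r(basal_methylation, tag_density_fl)  # close to zero
```

A strongly positive coefficient corresponds to the TET1-CD-like pattern, and a value near zero to the flat pattern described for control and TET1-FL-overexpressing cells.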

TET1 specifically binds unmethylated CGIs through its CXXC
domain

Considering that a high 5hmC yield depends not only on a
high level of substrate 5mC but also on abundant TET1
binding, we hypothesized that the gradual loss of TET1-FL
binding as DNA methylation levels increase may underlie its
inability to demethylate. Indeed, Tet1-bound CGIs have been
reported to be associated with lower 5mC levels compared to
the CGIs not bound by Tet1 in mESCs (21). We next compared
TET1-FL occupancy at eight promoter CGIs, four unmethylated
(BCL2L11, PACS1, PSEN2 and TTC9) and four hypermethylated
(BHLHA9, LRRC56, OPALH and SFMBT1) (methylation levels
shown in Supplementary Figure S7A), by ChIP-qPCR. TET1-FL
was highly enriched at unmethylated CGI promoters but
dramatically excluded from hypermethylated ones (Figure 4D).
This was also true for endogenous TET1 in HEK293T cells
(Supplementary Figure S7B). By contrast, through an unknown
mechanism, TET1-CD, which lacks the CXXC domain previously
linked to specific binding to CGIs (22), was extensively
bound to both kinds of CGIs, with a preference for
hypermethylated CGIs (Figure 4D). The CXXC domain-mutated
TET1-FL-C594A completely lost enrichment at unmethylated CGI
promoters, confirming the CXXC domain-dependent binding of
TET1 to genomic DNA (Figure 4E). As a result of the loss of
DNA binding, a much lower 5hmC yield was detected upon
overexpression of TET1-FL-C594A compared to TET1-FL
(Supplementary Figure S8). Taken together, our data suggest
that the preferential binding of TET1-FL to unmethylated
CGIs through its CXXC domain essentially limits its 5hmC
production (compared to TET1-CD, seen in Figure 1C) and
consequently leads to its failure to induce significant DNA
demethylation (Figure 2).

TET1-FL decreases DNA methylation levels in sparsely
methylated CGIs

The preferential binding of TET1-FL to unmethylated CGIs
prompted us to ask whether TET1-FL selectively induces
demethylation in hypomethylated CGIs. Although generally
referred to as 'unmethylated', many CGIs actually have low
levels of DNA methylation when carefully examined by
quantitative (and sensitive) methods. We therefore
re-analyzed the DREAM results of (m)TET1-FL transfections
with a focus on hypomethylated CGI sites for which very
precise methylation data were available by virtue of a high
level of sequence coverage (>100 tags). Around 1,100 CpG
sites were identified with measurable low methylation levels
(1-20%) and divided into four groups with basal methylation
of 1-5, 5-10, 10-15 and 15-20%, respectively. Using pairwise
analysis, we found that TET1-FL overexpression significantly
decreased methylation in sites with 1-5% (P < 0.0001) and
5-10% (P < 0.0001) basal methylation, but not in those with
10-15 or 15-20% basal methylation (Figure 5A-D). Therefore,
TET1-FL exhibits a specific demethylating activity in
sparsely methylated CGIs.
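
The grouping and pairwise comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes half-open bins (a site with exactly 5% basal methylation falls in the 5-10% group) and a Wilcoxon signed-rank test with a normal approximation, without tie or continuity correction. The function names and the example values are hypothetical.

```python
import math

def bin_by_basal(pairs, edges=(1, 5, 10, 15, 20)):
    """Group (basal, treated) methylation pairs (in %) by basal level.

    Bins are half-open, [lo, hi); this boundary rule is an assumption.
    """
    bins = list(zip(edges, edges[1:]))
    groups = {f"{lo}-{hi}%": [] for lo, hi in bins}
    for basal, treated in pairs:
        for lo, hi in bins:
            if lo <= basal < hi:
                groups[f"{lo}-{hi}%"].append((basal, treated))
                break
    return groups

def signed_rank_test(pairs):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive their average rank.
    """
    diffs = [treated - basal for basal, treated in pairs if treated != basal]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, p_value

# Hypothetical sites: (basal %, % after TET1-FL overexpression).
sites = [(2.0, 1.0), (3.0, 1.5), (4.0, 2.5), (6.0, 4.0), (8.0, 7.5),
         (12.0, 12.5), (13.0, 12.0), (17.0, 17.5), (19.0, 18.0)]
groups = bin_by_basal(sites)
results = {name: signed_rank_test(group) for name, group in groups.items()}
```

With only a handful of sites per group the normal approximation is crude; an exact test would be preferable for small groups.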

TET1 prevents DNA methylation spreading into CGIs

In contrast to the sporadically distributed CpGs that are
heavily methylated, CGIs are typically unmethylated in
mammalian genomes (2). The above results, namely that
TET1-FL induces striking 5hmC accumulation at the edges of
hypomethylated CGIs and specific demethylation in sparsely
methylated CGIs in HEK293T cells, suggested that TET1
functions primarily to prevent methylation of CGIs rather
than to dynamically switch methylation states in
differentiated cells. To test this directly, we next
established shRNA-mediated TET1 knockdown in HEK293T cells
as well, because of their relatively high endogenous TET1
expression (Supplementary Figure S9). Knockdown of TET1 in
two independent clones did not change cell morphology but
significantly inhibited cell growth (Figure 6A and
Supplementary Figure S10), as reported in NIH3T3 cells (43).
As expected, it also decreased genomic 5hmC content (Figure
6B), as well as the enrichment of TET1 at the unmethylated
promoter CGIs of BCL2L11, PACS1, PSEN2 and TTC9 (Figure 6C
and Supplementary Figure S11A). We then studied DNA
methylation at both upstream edges and central regions of
these four CGIs by bisulfite-cloning-sequencing. Among the
eight tested regions in control cells, the edge of BCL2L11
was completely methylated, that of PACS1 showed partial
methylation and the other regions were almost unmethylated
(Figure 6D and Supplementary Figure S11B-D). TET1 knockdown
induced a significant increase of methylation only at the
edge of the PACS1 CGI, with no measurable changes in the
other tested regions (Figure 6D and Supplementary Figure
S11B-D), suggesting that TET1 mainly regulates DNA
methylation at the boundary between methylated and
unmethylated CpG sites close to CGIs. Given that
pre-existing DNA methylation could serve as a seed for
methylation spreading into nearby unmethylated regions
(44,45), and given the specific accumulation of 5hmC at CGI
edges by TET1-FL, it seems plausible that TET1 blocks DNA
methylation spreading in this context through its 5mC
dioxygenase function.

To confirm these data, we randomly selected four other
unmethylated promoter CGIs (KAZN, MUM1, RFX6 and VAX2) that
had methylated edges based on DREAM results (Supplementary
Table S2). The specific binding of TET1 and the methylated
edges were further validated for each CGI (Figure 6C and D).
Consistent with the findings in the PACS1 CGI, the
methylated edges of these CGIs also showed significant
methylation spreading in both TET1 knockdown clones (Figure
6D). Thus, TET1 knockdown resulted in DNA methylation
spreading in all five CGIs specifically at their methylated
edges, consistent with the hypothesis that TET1 binds to
hypomethylated CGIs (through its CXXC domain) and functions
as a 'maintenance' demethylase that inhibits the spreading
of de novo DNA methylation from methylated CGI edges.



Effects of TET1 on gene transcription

DNA methylation of CGI promoters is associated with
repressed gene transcription (2,46), and TET1 has previously
been reported to affect gene expression (21,22,25,28). We
therefore asked whether TET1 is required for the active
transcription of target genes by preventing de novo DNA
methylation spreading into the CGI promoters. We first
analyzed the effect of TET1 knockdown on the expression of
the target genes analyzed above. TET1 knockdown reduced the
expression of only three out of the five genes for which we
confirmed DNA methylation spreading into their promoter CGIs
(Figure 7A), and caused no or inconsistent expression
changes in three other genes that had unchanged DNA
methylation at their promoter CGIs (Figure 7B). We next used
cDNA microarrays to analyze whole-genome gene expression
changes after TET1 knockdown. Using a criterion of 1.5-fold
expression change, 89 upregulated and 97 downregulated genes
were identified in both TET1 knockdown cell clones
(Supplementary Figure S12A), and some of them were further
validated by RT-qPCR (Supplementary Figure S12B and C).
Three of these validated downregulated genes were further
shown to be TET1 target genes but did not consistently gain
DNA methylation spreading in their CGI promoters after TET1
knockdown (Supplementary Figure S12D and E). Thus,
consistent with previous results indicating that depletion
of Tet1 induces similar gene expression changes in wild type
and Dnmt TKO mESCs (25), our data suggest relatively minor
effects of TET1 on gene transcription in HEK293T cells that
are both DNA methylation-dependent and -independent.

*P < 0.05, **P < 0.01 by Student's t-test compared to shControl clone cells. (D) Bisulfite-sequencing analysis shows increased DNA methylation at the
methylated edges of unmethylated CGIs after TET1 knockdown. The top panel contains diagrams of PACS1, KAZN, MUM1, RFX6 and VAX2 CGI
promoters. Horizontal green bars represent CGIs. Red bars show the location of boundary amplicons studied by bisulfite-sequencing. In the lower panel,
each line represents a different cloned sequence, with black squares representing methylated CpG sites. The average methylation is shown below each panel
as mean ± SD (n = 3). In all cases, boundary methylation increased after TET1 knockdown.
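
The selection rule used for the microarray data (a 1.5-fold change observed in both knockdown clones) can be sketched as below. This is an illustrative Python example only: it assumes linear-scale expression values, treats the threshold as inclusive, and uses hypothetical gene names and intensities. The function name `consistent_changes` is ours.

```python
def consistent_changes(control, knockdowns, fold=1.5):
    """Return (up, down) gene sets changed by at least `fold` in every clone.

    `control` and each entry of `knockdowns` map gene -> linear-scale
    expression value; a gene must pass the threshold in all clones.
    """
    up, down = set(), set()
    for gene, base in control.items():
        if base <= 0:
            continue  # ratios are undefined for non-positive values
        if not all(gene in kd for kd in knockdowns):
            continue  # gene missing from at least one clone
        ratios = [kd[gene] / base for kd in knockdowns]
        if all(r >= fold for r in ratios):
            up.add(gene)
        elif all(r <= 1 / fold for r in ratios):
            down.add(gene)
    return up, down

# Hypothetical intensities for a control clone and two knockdown clones.
control = {"geneA": 100, "geneB": 100, "geneC": 100, "geneD": 100}
kd_clone1 = {"geneA": 160, "geneB": 60, "geneC": 160, "geneD": 110}
kd_clone2 = {"geneA": 150, "geneB": 50, "geneC": 90, "geneD": 95}

up, down = consistent_changes(control, [kd_clone1, kd_clone2])
```

Requiring agreement between independent clones, as in this sketch, filters out changes such as the hypothetical geneC that appear in only one clone.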

To more directly determine whether TET1 functions in gene
transcription regulation, we undertook RNA-seq analysis of
HEK293T cells after overexpression of TET1-FL or mTET1-FL.
As expected, TET1-FL- and mTET1-FL-overexpressing cells
showed an extremely high mRNA level of TET1 (~150-fold
increase compared with the vector control, Figure 7C).
Strikingly, TET1-FL and mTET1-FL had nearly identical gene
expression profiles, but both were significantly different
from the control transfections (Figure 7D). Consistent with
this, when we computed differentially expressed genes
(>2-fold change, FDR < 0.05) in TET1-FL and mTET1-FL
compared with control, we found a significant overlap and a
strong positive correlation between the gene sets (Figure 7F
and G). Thus, overexpression of TET1-FL induces significant
expression changes despite its limited ability to increase
5hmC, and these changes are also seen with an enzymatically
dead mutant TET1 vector (mTET1-FL). Our data do not imply a
direct contribution of TET1 to gene expression and suggest
that the effects observed are independent of its catalytic
activity and of its demethylating activity.
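
As an illustration of how an overlap between two differentially expressed gene sets might be assessed, the Python sketch below applies the stated thresholds (>2-fold change, FDR < 0.05) and then computes a hypergeometric tail probability for the overlap. The text does not state how the significance of the overlap was evaluated, so the hypergeometric test is an assumption, as are the function names and the toy values.

```python
from math import comb

def de_genes(results, min_fold=2.0, max_fdr=0.05):
    """Select genes changed more than `min_fold` in either direction.

    `results` maps gene -> (fold_change, fdr), where fold_change is the
    linear-scale ratio of treated over control expression.
    """
    return {gene for gene, (fc, fdr) in results.items()
            if fdr < max_fdr and (fc > min_fold or fc < 1 / min_fold)}

def overlap_p_value(set_a, set_b, universe_size):
    """Hypergeometric P(overlap >= observed) for two gene sets.

    `universe_size` is the number of genes tested in total; both sets
    are assumed to be drawn from that same universe.
    """
    a, b = len(set_a), len(set_b)
    observed = len(set_a & set_b)
    total = comb(universe_size, b)
    tail = sum(comb(a, i) * comb(universe_size - a, b - i)
               for i in range(observed, min(a, b) + 1))
    return tail / total

# Hypothetical per-gene results for two transfections.
fl = {"g1": (2.5, 0.01), "g2": (0.4, 0.02), "g3": (3.0, 0.20), "g4": (1.5, 0.001)}
mfl = {"g1": (2.2, 0.03), "g2": (0.3, 0.01), "g3": (1.1, 0.50), "g4": (2.4, 0.04)}

de_fl = de_genes(fl)
de_mfl = de_genes(mfl)
p_overlap = overlap_p_value(de_fl, de_mfl, universe_size=len(fl))
```

With a realistic universe of thousands of genes, a small P value from this calculation would indicate that the two sets share more genes than expected by chance.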

Very similar results were seen with TET1-CD. Both TET1-CD
and mTET1-CD overexpression changed gene expression profiles
compared with the control, even though only TET1-CD induced
massive DNA demethylation (Figure 7C and E). TET1-CD and
mTET1-CD also had very similar gene expression profiles
(Figure 7E) despite the mutation in the catalytic domain of
mTET1-CD, and their differentially expressed genes compared
to controls also significantly overlapped and positively
correlated (Figure 7F and G). The differentially expressed
genes of (m)TET1-CD overexpression substantially overlapped
and positively correlated with those of (m)TET1-FL (Figure
7F and G), suggesting that the catalytic domain is still
important to the observed gene expression differences, even
though this effect is independent of its ability to catalyze
5hmC formation and induce demethylation. The above RNA-seq
data were also confirmed by gene expression microarrays with
similar results (Supplementary Figure S13A and B). Thus,
these results conclusively establish that the gene
expression changes seen after TET1 overexpression or
depletion are separate from and independent of its
demethylating activity.

DISCUSSION 

In this paper, we demonstrate that TET1-FL overexpression
has minimal effects on global DNA methylation. Rather, TET1
is a critical component of methylation boundaries and serves
as a maintenance DNA demethylase that prevents aberrant
methylation spreading into hypomethylated CGIs. We also
provide evidence that the effects of TET1 on gene
transcription observed after overexpression are unrelated to
its 5mC dioxygenase function.

Figure 7. TET1 regulates gene transcription independent of its demethylating activity. (A and B) Effect of TET1 knockdown on the expression of its
target genes with (A) or without (B) increased DNA methylation spreading into their CGI promoters. Data represent mean ± SD (n = 3). *P < 0.05, **P
< 0.01 by Student's t-test compared to shControl cells. (C) RNA-seq transcriptional landscapes of the TET1 gene in the vector control and (m)TET1-
FL- or (m)TET1-CD-overexpressing HEK293T cells. The two introduced substitution mutations in mTET1-FL and mTET1-CD were also validated
by RNA-seq. (D and E) Scatter plots comparing the RNA-seq-derived gene expression profiles among the vector control, TET1-FL and mTET1-FL
overexpressions (D), and among the vector control, TET1-CD and mTET1-CD overexpressions (E). Pearson correlation coefficients, r, are listed in each
graph. (F) Venn diagram showing the overlap of differentially expressed genes (>2-fold change, FDR < 0.05) of TET1-FL, mTET1-FL, TET1-CD and
mTET1-CD overexpressions. (G) Heatmap showing hierarchical clustering of differentially expressed genes of TET1-FL, mTET1-FL, TET1-CD and
mTET1-CD overexpressions. All cells were collected by FACS 3 days after transfection.

Our results differ from previous studies that investigated
the DNA demethylating effects of TET1 using overexpression
of a truncated, catalytic domain-only version of TET1
(TET1-CD) (17,32). We found that TET1-CD but not TET1-FL
induces genome-wide demethylation. This difference cannot
simply be the result of lower TET1-FL expression, because
TET1-FL and TET1-CD had similar mRNA levels by RNA-seq
(Figure 7C), and the apparent difference in protein
expression is likely overestimated because the large
difference in protein size leads to differing membrane
transfer efficiencies. Moreover, increasing the expression
time significantly compensated for the low expression level
of TET1-FL in terms of 5hmC production (Figure 1C), and
finally, the different regulation patterns of 5hmC
distribution (Figure 3C-H) cannot be accounted for by
differing expression levels. Conversely, the failure of
TET1-FL to induce genome-wide demethylation can be
essentially explained by its CpG motif DNA-binding domain,
CXXC, which appears to strongly limit the 5hmC production
capacity and demethylating potential of TET1 by guiding it
toward hypomethylated DNA.

Previous studies have also shown that Tet1 is predominantly
enriched at CpG-rich genomic regions (21,22,25), where most
CpGs are unmethylated (2,47). We found that TET1 binds
unmethylated but not hypermethylated CGIs (Figure 4E and F).
Thus, the CXXC domain and the catalytic domain of TET1 form
an interesting but conflicting domain combination: the CXXC
domain specifically targets TET1 to hypomethylated regions
where its substrate (5mC) is almost depleted and
consequently limits the 5hmC yield and demethylating
activity of the catalytic domain in hypermethylated genomic
regions. Therefore, at least in HEK293T cells, TET1-mediated
5mC oxidation is not used for global DNA demethylation but
rather prevents methylation spreading into CGIs. By
contrast, the absence of the CXXC domain and the binding to
both unmethylated and hypermethylated CGIs (with a
preference for the latter, Figure 4D) adequately explain why
TET1-CD overexpression is able to induce massive global DNA
demethylation. It is interesting to note that TET1-FL
increases 5hmC by DNA dot blots and hMeDIP-seq yet fails to
affect DNA methylation genome wide. This paradox may be
technical in part: it is impossible to relate dot blot 5hmC
levels to the actual 5hmC amount per methylated CpG site.
Considering that the basal 5hmC level is extremely low in
HEK293 cells (only ~0.35% of 5mC) (14), doubling or tripling
this amount may be detectable by an enrichment method such
as hMeDIP-seq but may not be enough to significantly change
the global DNA methylation level. Also, the hMeDIP-seq data
show that the 5hmC increase with TET1-FL is highly
concentrated at the edges of hypomethylated CpG islands,
thus affecting <1% of all CpG sites, which might not result
in significant global demethylation.

We note that the TET1 CXXC domain was recently reported to
bind both unmethylated and methylated DNA probes in in vitro
GST pull-down assays (22,32). Still, even in these studies,
the binding of the TET1 CXXC domain to methylated DNA was
much lower than that to unmethylated DNA probes.
Furthermore, this detected binding to a methylated DNA probe
may disappear in vivo, in the complex chromatin environment
in cells. For example, the recruitment of endogenous
methyl-CpG binding proteins to methylated CpGs and the
densely packed heterochromatin state in hypermethylated
genomic regions could both dramatically inhibit the access
of TET1 to methylated CpGs. Consistent with this, we were
unable to detect binding of transfected or endogenous TET1
to methylated CGIs in our ChIP-qPCR studies.

Our finding that TET1-FL is unable to induce global DNA
demethylation in differentiated cells may provide some
mechanistic insight into the recent studies on the role of
TET1 in embryonic development. Although the conversion of
5mC to 5hmC underlies global demethylation in mouse
primordial germ cells where Tet1 and Tet2 are highly
expressed, loss of Tet1 and Tet2 does not greatly affect
this epigenome remodeling (26,28). Moreover, both
Tet1-knockout and Tet1/Tet2 double-knockout mice are viable,
fertile and grossly normal, though abnormal imprinting of
some genes can be detected in a fraction of Tet1/Tet2
double-knockout embryos (48,49). Thus, the conflicting
combination of CXXC and catalytic domains may explain why
Tet1 has a limited role in global demethylation in
primordial germ cells. The subtle changes in DNA methylation
induced by TET1 suggest that this protein may play a greater
role in situations where there is a stress on the system and
a pressure to methylate CpG islands, such as during aging,
in inflammatory conditions or in cancer (50-52).

Our observation that TET1-FL induces specific accumulation
of 5hmC at the edges of hypomethylated CGIs, while TET1
knockdown induces methylation spreading into hypomethylated
CGIs, also uncovers a DNA demethylase-based mechanism for
the immunity of CGIs to DNA methylation, which is one of the
most striking features of DNA methylation patterns in
mammals (2). By contrast, all the other known
methylation-protecting factors, such as binding of
transcription factors, high transcription activity and the
active chromatin mark H3K4me3, protect CGIs from de novo DNA
methylation by excluding DNMTs (53). Thus, the
demethylase-based mechanism may reasonably cooperate with
the DNMT-exclusion mechanism to provide a more solid
protection for hypomethylated CGIs against methylation
attack. This finding also suggests a possible involvement of
TET1 in the yet unexplained occurrence of aberrant CGI
hypermethylation in aging and cancer, which provides cancer
cells with an advantage in cell growth and invasion (1,54).
Indeed, methylation boundaries around CGIs become much less
well-defined during cancer formation (55), and it would be
interesting to see if this is caused in part by the
frequently detected TET1 mutations and downregulation in
neoplastic cells (31,56-59).

One of the paradoxes in the TET protein field has been the
disconnection between effects on DNA methylation and effects
on gene expression. Recent studies have suggested a complex
function of Tet1 in gene transcription regulation in mESCs:
Tet1 knockdown induces similar upregulation and
downregulation of gene transcription in wild type and Dnmt
KO mESCs (25) and, by recruiting PRC2 and Sin3a to
chromatin, Tet1 has been proposed to repress gene
transcription independent of its effects on DNA methylation
(21,25). More recent studies also suggested that Tet1
regulates gene transcription through its interaction with
O-linked N-acetylglucosamine transferase (60-62). However,
the lack of a major phenotype in the Tet1-knockout and
Tet1/Tet2 double-knockout mice suggests that the effects of
Tet1 on transcription may be relatively minor. Here, by
comparing the wild type TET1-FL and its catalytically mutant
control, we directly demonstrate that the gene expression
changes observed after TET1 overexpression are independent
of its enzymatic and demethylating activity. It remains to
be seen whether these effects are direct or indirect, and
whether they are physiologically relevant. These data also
have implications for understanding the function of TET2 and
TET3. In particular, TET2 mutations that are common in
myeloid leukemias seem to have little effect on CGI DNA
methylation (63). Dissecting DNA methylation-dependent and
-independent effects of TET2 will be important to understand
its role in cancer development.

In summary, our study demonstrates that in post-development
cells TET1 works as a maintenance DNA demethylase which does
not purposely decrease DNA methylation levels, but rather
specifically prevents de novo DNA methylation spreading from
methylated CGI edges into CGIs using its 5mC dioxygenase
catalytic function. These findings support a role for DNA
demethylation in maintaining normal DNA methylation patterns
post-development and have implications for understanding
methylation deregulation in aging and cancer.